An important step in building a quantum computer is calibrating experimentally implemented
quantum gates to produce operations that are close to ideal unitaries. The calibration step involves
estimating the systematic errors in gates and then using controls to correct the implementation.
Quantum process tomography is a standard technique for estimating these errors, but it is time
consuming when one only wants to learn a few key parameters, and it is usually inaccurate without
resources like perfect state preparation and measurement, which might not be available. With the
goal of efficiently and accurately estimating specific errors using minimal resources, we develop a
parameter estimation technique that can gauge key systematic parameters (specifically, amplitude
and off-resonance errors) in a universal single-qubit gate-set with provable robustness and efficiency.

In particular, our estimates achieve the optimal efficiency, Heisenberg scaling, and do so without
entanglement and entirely within a single-qubit Hilbert space. The main theorem that makes this
possible is a robust version of the phase estimation procedure of Higgins et al. [10].


I. INTRODUCTION 

Not all errors in a quantum computation experiment are created equal. There are two broad
classes of error: unitary errors, also known as systematic errors, and nonunitary errors, also
known as decoherence. Both sets of errors need to be corrected below a certain threshold for
scalable quantum computation to take place [1, 26]. Correcting systematic errors, such as
over-rotation or off-resonance errors, is typically regarded as the easier task; because these
errors are directly related to the controls available to an experimenter, they can be directly
corrected by changing those controls. In this respect systematic errors contrast with decoherence,
which is typically less affected by an experimenter's control software and more influenced by
imperfect or nonideal hardware.

However, even though systematic errors are considered the easier of the two to correct,
calibrating gates in a quantum computer to reduce systematic errors can still take hours even for
modest system sizes, and moreover this calibration may have to be repeated every time the quantum
computer is switched on [19]. Not only can this process be inefficient in terms of the precision
of the estimates with respect to time, but standard techniques for estimating systematic errors
often suffer from measurement bias, leading to inaccurate estimates [6].

To characterize systematic errors, quantum process tomography [7] has long been a valuable tool
in the experimental toolkit. However, standard techniques [7] require perfect state preparation,
perfect measurement, and at least some perfect gates. Especially during the calibration stage of
an experiment, it is unreasonable to assume access to such perfect resources, and, without them,
standard process tomography results in a difficult nonlinear estimation problem [29-31], and hence
the estimates obtained using this technique are typically inaccurate. Moreover, systematic errors
are controlled by a few key parameters, but unless the measurement basis of the tomography
procedure is specialized, e.g. [2], extracting those few important parameters can require
resources that scale exponentially with the size of the system and can be time consuming even for
single-qubit processes.

Recent approaches aim to circumvent the stringent requirements of standard tomography. Randomized
benchmarking (RB) [16, 20, 22], randomized benchmarking tomography (RBT) [15], and other tools
based on randomized benchmarking [33, 34] can characterize quantum error processes even when
nothing is known about state preparation and measurement. However, these procedures require access
to relatively good Clifford operations [9, 21]. In addition, other than certain key parameters
like the average fidelity, single parameters cannot be extracted efficiently. While the average
fidelity can be learned efficiently using RB, it gives no information about the nature of the
systematic errors on the gates, and so is of no use to experimentalists who would like to use
tomographic data to correct systematic errors.

Another promising approach is gate-set tomography (GST) [4, 23]. GST makes no assumptions about
state preparation, measurement, or processes, while still obtaining accurate estimates. However,
GST is even more inefficient than standard tomography, since to learn even a single parameter, one
must fully characterize a complete gate-set along with state preparation and measurement.

We propose a new procedure to estimate simultaneously all the systematic errors in a universal
single-qubit gate-set. This procedure falls in between existing protocols in terms of required
resources and assumptions, but is optimal in terms of asymptotic efficiency. Rather than doing
full tomography, we extract only parameters that correspond to systematic errors, precisely the
errors that the experimentalists can easily correct. We learn those parameters efficiently and
non-adaptively; in fact, we are Heisenberg limited. Like GST, we require no perfect resources,
and, moreover, we do not require any additional gates besides the ones we are characterizing. We
also never require more than a single-qubit Hilbert space. In particular, we never need entangled
states, like those often employed in interferometric phase estimation procedures [14, 32].
Instead, the source of the quantum advantage in our procedure is the exploitation of long
coherence times of the qubit system, and our ability to apply a gate multiple times in series.
This allows small variations in gates to coherently accumulate into large observables.

Of course, like other Heisenberg limited studies [12, 17], a finite coherence time ultimately
limits the estimation accuracy that we can achieve. However, our procedure does retain Heisenberg
scaling against state preparation errors and measurement errors. Thus, while a standard parameter
estimation scheme (one that repeatedly prepares a state, applies an operation, and then measures)
is limited by uncertainty in the measurement operator, our procedure can obtain Heisenberg-limited,
arbitrarily precise parameter estimates even with unknown (but not too large) errors in the
measurement operator. In this way, our procedure also has some of the flavor of randomized
benchmarking.

In order to achieve these gains in efficiency and accuracy, we lose some of the flexibility of
other procedures. Our procedure will fail if errors are larger than some threshold amount. Also,
the procedure is most useful when the experimentalist has precise control over the gates, and can
undo the systematic errors once they are characterized. We hope that a calibration procedure like
the one we describe could be used to quickly "tune up" gates before more sophisticated procedures
like RBT or GST are employed to characterize non-systematic (decoherence) errors.

Our main theorem says that it is possible to perform phase estimation in the presence of errors.
In particular, we consider additive errors in the measurement probabilities of experiments. This
is a fairly straightforward idea, but it turns out that many different effects can be swept into
these additive errors. For example, state preparation and measurement errors can be seen as
additive errors. We show how to do phase estimation in the presence of these additive errors and
extract two parameters of a process, amplitude and off-resonance errors, instead of only learning
the phase of a rotation, as is typical. It turns out that while estimating one of the parameters
of interest, the effect of the other parameter can be thought of as another additive error.
Moreover, when multiple additive errors occur simultaneously, the result is still an additive
error, with (worst-case) magnitude equal to the sum of the magnitudes of the individual additive
errors.

In particular, we modify and improve a non-adaptive phase estimation technique of Higgins et al.
[10] to show

Theorem 1.1. Suppose that we can perform two families of experiments, $|0\rangle$-experiments and
$|+\rangle$-experiments, indexed by $k \in \mathbb{Z}^+$, whose probabilities of success are,
respectively,

$$p_0(A,k) = \frac{1+\cos(kA)}{2} + \delta_0(k), \tag{1.1}$$

$$p_+(A,k) = \frac{1+\sin(kA)}{2} + \delta_+(k). \tag{1.2}$$

Also assume that performing either of the $k$th experiments takes time proportional to $k$, and
that

$$\sup_k \{|\delta_0(k)|, |\delta_+(k)|\} < 1/\sqrt{8}. \tag{1.3}$$

Then an estimate $\hat{A}$ of $A \in (-\pi,\pi]$ with standard deviation $\sigma(\hat{A})$ can be
obtained in time $T = O(1/\sigma(\hat{A}))$ using non-adaptive experiments.

On the other hand, if $|\delta_0(k)|$ and $|\delta_+(k)|$ are less than $1/\sqrt{8}$ for all
$k < k^*$, then it is possible to obtain an estimate $\hat{A}$ of $A$ with
$\sigma(\hat{A}) \sim O(1/k^*)$ (with no promise on the scaling of the procedure).

More precise bounds on the scaling of standard deviation with time can be found in Section V.

We call the terms $\delta_0(k)$ and $\delta_+(k)$ additive errors. While we can only achieve
Heisenberg scaling up to arbitrary precision when the additive errors have magnitude less than
$1/\sqrt{8}$ for all $k$, some effects (like depolarizing errors) cause additive errors that grow
with $k$ and so eventually overwhelm the $1/\sqrt{8}$ bound. However, in that case, if $k^*$ is
the $k$ at which the errors become too large, our procedure can give an estimate with precision
that is $O(1/k^*)$, which is often better than standard procedures, which are limited by
uncertainty in state preparation and measurement.

The layout of the paper is as follows. First, in Section II, we define notation for single-qubit
operations and errors. In Section III we use Theorem 1.1 to calibrate systematic errors in a
single-qubit gate-set, and then Section IV discusses the robustness of this procedure to sources
of error such as imperfect state preparation, measurement noise, and decoherence. Finally, in
Section V we modify and reanalyze the non-adaptive Heisenberg limited phase estimation procedure
of [10] to achieve better scaling and simpler bounds, resulting in the proof of Theorem 1.1.


II. CHARACTERIZING A UNIVERSAL 
GATE-SET 

We consider systematic errors in a universal single-qubit gate-set. For the moment, we assume
that the implemented gates have systematic errors but no decoherence errors, and hence are perfect
unitaries. (We relax these assumptions in Section IV.) Single-qubit unitaries are defined by two
parameters: their axis of rotation and their angle of rotation in the Bloch sphere. (See [25] for
background on the Bloch sphere.)

Two unitary gates are sufficient to create a universal single-qubit gate-set. We describe a scheme
to characterize a gate-set where the two gates are ideally orthogonal. In particular, we consider
the case that one gate is a faulty implementation of $Z_{\pi/2}$, a $\pi/2$ rotation about the
Z-axis of the Bloch sphere, and the other gate is a faulty implementation of $X_{\pi/4}$, a
$\pi/4$ rotation about the X-axis. We also assume that the experimenter can create an imperfect
$|0\rangle$ state, the 1-valued eigenstate of $Z_{\pi/2}$. How good the gates, state preparation,
and measurement must be initially for our procedure to work is determined by Theorem 1.1, and will
be made clear in the calibration procedures in Section III.

We chose specific rotation angles for our Z and X rotations. This choice is mainly for
convenience, since it turns out that access to (1) imperfect versions of the states

$$|0\rangle, \qquad |+\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad |{\rightarrow}\rangle = \frac{|0\rangle + i|1\rangle}{\sqrt{2}}, \tag{II.1}$$

and to (2) a $Z_\pi$ rotation calibrated to near perfection, are sufficient to characterize
$Z_\chi$ and $X_\phi$ for any rotations $\chi$ and $\phi$ using our techniques. Only calibration
of $X_\phi$ requires the second condition. These two conditions are satisfied given the gate-set
in the previous paragraph. Indeed, in an experiment where $Z_\chi$ and $X_\phi$ are available,
albeit erroneously, for any $\chi$ and $\phi$, it would perhaps be best to first calibrate
$Z_{\pi/2}$ and $X_{\pi/4}$ rotations so that conditions (1) and (2) are satisfied before
calibrating $Z_\chi$ and $X_\phi$ for arbitrary $\chi$ and $\phi$.

We now define our universal gates mathematically. Without loss of generality, we can define the
Z-axis of the Bloch sphere to be aligned with the axis of rotation of our approximate $Z_{\pi/2}$
gate. This means that our initial state preparation may not be aligned with the Z-axis, but our
scheme is robust against this type of error. Once the axis of our approximate $Z_{\pi/2}$ gate is
fixed to the Z-axis, the only free parameter is the angle of rotation. Thus, we can write our
approximate $Z_{\pi/2}$ gate as

$$Z_{\pi/2}(\alpha) = \cos\left(\frac{\pi}{4}(1+\alpha)\right) I - i \sin\left(\frac{\pi}{4}(1+\alpha)\right) P_Z, \tag{II.2}$$

where $\{P_X, P_Y, P_Z\}$ are the Pauli matrices, $I$ is the $2\times 2$ identity matrix, and
$\alpha$ is a parameter that quantifies how far the implemented angle of rotation is from
$\pi/2$. When $\alpha = 0$, we have implemented a perfect gate.

Likewise, without loss of generality, we define the X-axis of the Bloch sphere so that the axis of
rotation of our approximate $X_{\pi/4}$ gate lies in the XZ-plane of the Bloch sphere. In this
case, the approximate $X_{\pi/4}$ gate has two degrees of freedom: the location of the axis of
rotation in the XZ-plane of the Bloch sphere, and its angle of rotation. More precisely, we can
write our approximate $X_{\pi/4}$ gate as

$$X_{\pi/4}(\epsilon,\theta) = \cos\left(\frac{\pi}{8}(1+\epsilon)\right) I - i \sin\left(\frac{\pi}{8}(1+\epsilon)\right) \left(\cos(\theta) P_X + \sin(\theta) P_Z\right), \tag{II.3}$$

where $\theta$ is the angle of the axis of rotation relative to the X-axis, and $\epsilon$ is a
parameter that quantifies how far the implemented angle of rotation is from $\pi/4$. When
$\epsilon = \theta = 0$ we have implemented a perfect gate.

Our goal is to estimate $\alpha$, $\theta$, and $\epsilon$, with the expectation that once these
systematic errors have been quantified, experimentalists can adjust the controls of the gates to
set their values close to 0. If desired, the process can then be repeated: the new values of
$\alpha$, $\theta$, and $\epsilon$ can be reestimated and readjusted again.

We will also need notation for a general imperfect X rotation:

$$X_\phi(\epsilon,\theta) = \cos\left(\frac{\phi}{2}(1+\epsilon)\right) I - i \sin\left(\frac{\phi}{2}(1+\epsilon)\right) \left(\cos(\theta) P_X + \sin(\theta) P_Z\right). \tag{II.4}$$

This expression $X_\phi(\epsilon,\theta)$ represents a rotation about an axis in the XZ-plane of
the Bloch sphere, which is approximately a rotation by an angle $\phi$. In general, the parameters
$\epsilon$ and $\theta$ will depend on $\phi$.

In some cases, we will apply the unitary operations $X_{\pi/4}(\epsilon,\theta)$ and
$Z_{\pi/2}(\alpha)$ to mixed states instead of pure states. In this case, we will use cursive
letters to represent the CPTP maps corresponding to these unitaries. That is,

$$\mathcal{X}_{\pi/4}(\epsilon,\theta)(\rho) = X_{\pi/4}(\epsilon,\theta)\,\rho\,\left(X_{\pi/4}(\epsilon,\theta)\right)^\dagger,$$

$$\mathcal{Z}_{\pi/2}(\alpha)(\rho) = Z_{\pi/2}(\alpha)\,\rho\,\left(Z_{\pi/2}(\alpha)\right)^\dagger, \tag{II.5}$$

where $\dagger$ denotes the conjugate transpose.

We use the notation $Z_{\pi/2}(\alpha)^k$ to mean $k$ repeated applications of
$Z_{\pi/2}(\alpha)$. Unitaries act right to left, so $Z_{\pi/2}(\alpha) X_{\pi/4}(\epsilon,\theta)$
means apply the X-rotation first, and then the Z-rotation.


III. SEQUENCES FOR ESTIMATING 
SYSTEMATIC ERRORS 


In this section, we describe sequences consisting of unitaries $Z_{\pi/2}(\alpha)$ and
$X_{\pi/4}(\epsilon,\theta)$, which can be used to estimate the systematic error parameters
$\alpha$, $\epsilon$, and $\theta$. In particular, we would like to obtain observables
$p_0(\alpha,k)$, $p_+(\alpha,k)$, $p_0(\theta,k)$, $p_+(\theta,k)$, $p_0(\epsilon,k)$, and
$p_+(\epsilon,k)$, as described in Theorem 1.1. By Theorem 1.1, such observables will allow us to
accurately estimate $\alpha$, $\epsilon$, and $\theta$ as long as the additive errors associated
with these observables are not too large. We address the problem of initially bounding additive
errors in Appendix C.

In this section, we assume that we can prepare the states $|0\rangle$, $|+\rangle$, and
$|{\rightarrow}\rangle$ perfectly, and that we can measure (perfectly) the probability of being in
the state $|0\rangle$, or the probability of being in the state $|+\rangle$. In Section IV, we
introduce state preparation and measurement errors to our protocols.
A. Estimating $\alpha$


With the assumption of perfect state preparation and measurement, we can estimate $\alpha$ using
standard phase estimation, without having to resort to robust phase estimation. One can verify
that

$$\left|\langle +|\,Z_{\pi/2}(\alpha)^k\,|+\rangle\right|^2 = \frac{1+\cos\left(-k\frac{\pi}{2}(1+\alpha)\right)}{2},$$

$$\left|\langle +|\,Z_{\pi/2}(\alpha)^k\,|{\rightarrow}\rangle\right|^2 = \frac{1+\sin\left(-k\frac{\pi}{2}(1+\alpha)\right)}{2}. \tag{III.1}$$

Comparing with Eqs. (1.1) and (1.2), we see these sequences can be used to estimate $\alpha$. If
$N$ is the number of times we apply $Z_{\pi/2}(\alpha)$, by Theorem 1.1, we can obtain an estimate
of $\alpha$ with standard deviation $O(1/N)$. This is what is meant by Heisenberg scaling or
Heisenberg limited. ($N$ is the most natural and unambiguous measure of resource consumption for
phase estimation; see the appendix of [10].)
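
Eq. (III.1) can be checked numerically. The following is a minimal sketch, assuming NumPy and the gate definition of Eq. (II.2); the helper names `z_gate` and `check_eq_III_1` are illustrative and do not come from the paper.

```python
import numpy as np

# 2x2 identity and Pauli Z.
I2 = np.eye(2, dtype=complex)
PZ = np.array([[1, 0], [0, -1]], dtype=complex)

def z_gate(alpha):
    """Faulty Z_{pi/2}(alpha) as written in Eq. (II.2)."""
    half_angle = (np.pi / 4) * (1 + alpha)
    return np.cos(half_angle) * I2 - 1j * np.sin(half_angle) * PZ

# |+> = (|0> + |1>)/sqrt(2) and |-> > = (|0> + i|1>)/sqrt(2) from Eq. (II.1).
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
arrow = np.array([1, 1j], dtype=complex) / np.sqrt(2)

def check_eq_III_1(alpha, k):
    """Return (lhs_cos, rhs_cos, lhs_sin, rhs_sin) for the two lines of Eq. (III.1)."""
    zk = np.linalg.matrix_power(z_gate(alpha), k)
    angle = -k * (np.pi / 2) * (1 + alpha)
    lhs_cos = abs(plus.conj() @ zk @ plus) ** 2
    lhs_sin = abs(plus.conj() @ zk @ arrow) ** 2
    return lhs_cos, (1 + np.cos(angle)) / 2, lhs_sin, (1 + np.sin(angle)) / 2
```

Calling `check_eq_III_1(0.03, 5)` returns two pairs of numbers that should agree to floating-point precision, which confirms that the phase being estimated is $A = -\frac{\pi}{2}(1+\alpha)$.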


B. Estimating $\epsilon$

We next describe the sequences used to estimate $\epsilon$. In this section, for ease of
explication later in the paper, we will characterize the general gate $X_\phi(\epsilon,\theta)$,
where we can always substitute $\pi/4$ for the variable $\phi$ to obtain the results relevant to
$X_{\pi/4}(\epsilon,\theta)$. Let $\phi_\epsilon = \phi(1+\epsilon)$. Again, a simple calculation
shows that

$$\left|\langle 0|\,X_\phi(\epsilon,\theta)^k\,|0\rangle\right|^2 = \frac{1+\cos(k\phi_\epsilon)}{2} + \sin^2\left(\frac{k\phi_\epsilon}{2}\right)\sin^2\theta,$$

$$\left|\langle 0|\,X_\phi(\epsilon,\theta)^k\,|{\rightarrow}\rangle\right|^2 = \frac{1+\sin(k\phi_\epsilon)}{2} - \sin(k\phi_\epsilon)\sin^2\left(\frac{\theta}{2}\right). \tag{III.2}$$

Comparing with Eqs. (1.1) and (1.2), we see this sequence allows us to make measurements with
success probabilities $p_{0/+}(\phi_\epsilon, k)$, with
$|\delta_0(k)|, |\delta_+(k)| < \sin^2(\theta)$.

By Theorem 1.1, as long as $|\theta|$ is less than about 36° (along with our current assumptions
of perfect state preparation and measurement), we can estimate $\phi_\epsilon$, and hence
$\epsilon$ (assuming a constant $\phi$), with standard deviation $O(1/N)$, where $N$ is the total
number of times $X_\phi(\epsilon,\theta)$ is used over the course of the protocol.

In Appendix C, we show how to independently bound the size of $\theta$, in order to determine if
$|\theta|$ is small enough to apply this protocol.
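
The additive errors induced by a tilted rotation axis can be computed directly. The sketch below assumes NumPy and the gate of Eq. (II.4); the function names `x_gate`, `additive_errors`, and `max_theta_degrees` are hypothetical helpers, not part of the paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PZ = np.array([[1, 0], [0, -1]], dtype=complex)

def x_gate(phi, eps, theta):
    """General imperfect X rotation X_phi(eps, theta) as in Eq. (II.4)."""
    half = (phi / 2) * (1 + eps)
    axis = np.cos(theta) * PX + np.sin(theta) * PZ
    return np.cos(half) * I2 - 1j * np.sin(half) * axis

zero = np.array([1, 0], dtype=complex)                  # |0>
arrow = np.array([1, 1j], dtype=complex) / np.sqrt(2)   # (|0> + i|1>)/sqrt(2)

def additive_errors(phi, eps, theta, k):
    """Return (delta_0, delta_+): deviations from the ideal forms of Eqs. (1.1), (1.2)."""
    xk = np.linalg.matrix_power(x_gate(phi, eps, theta), k)
    phi_eps = phi * (1 + eps)
    p0 = abs(zero.conj() @ xk @ zero) ** 2
    p_plus = abs(zero.conj() @ xk @ arrow) ** 2
    delta_0 = p0 - (1 + np.cos(k * phi_eps)) / 2
    delta_plus = p_plus - (1 + np.sin(k * phi_eps)) / 2
    return delta_0, delta_plus

def max_theta_degrees():
    """Largest |theta| (in degrees) for which sin^2(theta) stays below 1/sqrt(8)."""
    return np.degrees(np.arcsin(8 ** -0.25))
```

`max_theta_degrees()` evaluates to roughly 36.5, consistent with the "about 36°" threshold quoted above.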


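C. Estimating $\theta$

We now discuss sequences to estimate $\theta$. For the moment, we assume that after estimating
$\alpha$, we are able to set $\alpha = 0$ exactly. In Section IV A we will examine what happens to
this protocol when $\alpha$ is not zero.

Consider the rotation

$$U = Z_{\pi/2}(0)\, X_{\pi/4}(\epsilon,\theta)^4\, Z_{\pi/2}(0)^2\, X_{\pi/4}(\epsilon,\theta)^4\, Z_{\pi/2}(0). \tag{III.3}$$

Then, because any single-qubit unitary can be written as a rotation of some angle $\Phi$ about an
axis $\vec{n}$ in the Bloch sphere, we may write

$$U = \cos\left(\frac{\Phi}{2}\right) I - i \sin\left(\frac{\Phi}{2}\right) \vec{n}\cdot(P_X, P_Y, P_Z). \tag{III.4}$$

By direct expansion, we find that the Y-component of $\vec{n}$ is zero and

$$n_X = \frac{\cos(\theta)\cos\left(\frac{\pi\epsilon}{2}\right)}{\sqrt{1-\sin^2\theta\cos^2\left(\frac{\pi\epsilon}{2}\right)}}, \tag{III.5}$$

$$n_Z = \frac{\sin\left(\frac{\pi\epsilon}{2}\right)}{\sqrt{1-\sin^2\theta\cos^2\left(\frac{\pi\epsilon}{2}\right)}}, \tag{III.6}$$

$$\sin\left(\frac{\Phi}{2}\right) = 2\sin(\theta)\cos\left(\frac{\pi\epsilon}{2}\right)\sqrt{1-\sin^2(\theta)\cos^2\left(\frac{\pi\epsilon}{2}\right)}. \tag{III.7}$$

We define the angle $\Theta$ to be such that $\cos(\Theta) = n_X$ and $\sin(\Theta) = n_Z$. Using
our notation of Section II, we may write $U = X_\Phi(0,\Theta)$, and hence, using the techniques
of Section III B, we can obtain a Heisenberg limited estimate of $\Phi$ as long as $|\Theta|$ is
not too large. All that remains is to show that an estimate of $\Phi$ allows us to estimate
$\theta$ with similar precision, and that $\Theta$ is not too large.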

We have

$$|\Theta| = \arcsin|n_Z| = \arcsin\left|\frac{\sin(\pi\epsilon/2)}{\sqrt{1-\sin^2\theta\cos^2\left(\frac{\pi\epsilon}{2}\right)}}\right|, \tag{III.8}$$

which implies $\sin^2\Theta$ scales as $O(\epsilon^2)$. In particular, if
$\sin^2\theta < 1/\sqrt{8}$, as is necessary for estimating $\epsilon$ using the methods of
Section III B, then $|\epsilon| < 0.341$ is sufficient for estimating $\Phi$. We can independently
verify whether $|\epsilon|$ is small enough for the protocol to succeed using the techniques of
Appendix C.
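
The axis-angle decomposition of the composite rotation $U$ of Eq. (III.3) and the $|\epsilon| < 0.341$ threshold can be checked with a short numerical sketch. It assumes NumPy; `rotation`, `composite_u`, `axis_angle`, and `sin2_big_theta` are illustrative names. Because the overall sign of $(\sin(\Phi/2), \vec{n})$ depends on the global phase convention chosen for $U$, the sketch compares magnitudes only.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]], dtype=complex)
PZ = np.array([[1, 0], [0, -1]], dtype=complex)

def rotation(angle, axis):
    """Rotation by `angle` about the unit vector `axis`: cos(a/2) I - i sin(a/2) n.P."""
    n_dot_p = axis[0] * PX + axis[1] * PY + axis[2] * PZ
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * n_dot_p

def composite_u(eps, theta):
    """U of Eq. (III.3), built from a perfect Z_{pi/2} and a faulty X_{pi/4}."""
    z = rotation(np.pi / 2, (0.0, 0.0, 1.0))
    x = rotation((np.pi / 4) * (1 + eps), (np.cos(theta), 0.0, np.sin(theta)))
    x4 = np.linalg.matrix_power(x, 4)
    return z @ x4 @ z @ z @ x4 @ z

def axis_angle(u):
    """Return sin(Phi/2) * n as a real 3-vector (defined up to an overall sign)."""
    # For u = cos(Phi/2) I - i sin(Phi/2) n.P, tr(P_j u)/2 = -i sin(Phi/2) n_j.
    vec = np.array([np.trace(p @ u) for p in (PX, PY, PZ)]) / 2
    return np.real(1j * vec)

def sin2_big_theta(eps, theta):
    """sin^2(Theta) obtained from Eq. (III.8)."""
    s2 = np.sin(np.pi * eps / 2) ** 2
    return s2 / (1 - np.sin(theta) ** 2 * (1 - s2))
```

Evaluating `sin2_big_theta(0.341, t)` with `t = np.arcsin(8 ** -0.25)` (the largest allowed $\theta$) gives a value just below $1/\sqrt{8}$, which is how the quoted threshold on $|\epsilon|$ arises.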

We now show that estimating $\Phi$ is sufficient to estimate $\theta$. We have

$$\sin\frac{\Phi}{2} = 2\sin\theta\cos\frac{\pi\epsilon}{2}\sqrt{1-\sin^2\theta\cos^2\frac{\pi\epsilon}{2}}, \tag{III.9}$$

which can be expanded, assuming small $\theta$, as

$$\sin\frac{\Phi}{2} = 2\theta\cos\frac{\pi\epsilon}{2} + O(\theta^3). \tag{III.10}$$

Since $\epsilon$ can be estimated from Section III B, we can estimate

$$\theta \approx \frac{\sin(\Phi/2)}{2\cos(\pi\epsilon/2)}. \tag{III.11}$$

As long as $\epsilon$ and $\theta$ are not too large, the relationship between $\Phi$ and
$\theta$ is very close to linear, so if we know the standard deviation $\sigma(\hat{\Phi})$ of
$\hat{\Phi}$, our estimate of $\Phi$, we can obtain the standard deviation of our estimate of
$\theta$, $\sigma(\hat{\theta})$, as

$$\sigma(\hat{\theta}) \le \frac{\sigma(\hat{\Phi})}{4\cos(\pi\epsilon/2)}. \tag{III.12}$$

Since we can estimate $\Phi$ with Heisenberg limited uncertainty, this means we can estimate
$\theta$ with Heisenberg limited uncertainty.
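
The conversion from an estimate of $\Phi$ to an estimate of $\theta$ can be sketched as follows. This is an illustration under the small-$\theta$ assumption of Eq. (III.10), using only the Python standard library; the function names are hypothetical, and in practice the values of $\Phi$ and $\epsilon$ would be estimates rather than exact numbers.

```python
import math

def phi_from_theta(theta, eps):
    """Forward map of Eq. (III.9): the rotation angle Phi of U for given theta, eps."""
    x = math.sin(theta) * math.cos(math.pi * eps / 2)
    return 2 * math.asin(2 * x * math.sqrt(1 - x * x))

def theta_estimate(phi, eps):
    """Linearized inverse of Eq. (III.11); accurate up to O(theta^3) corrections."""
    return math.sin(phi / 2) / (2 * math.cos(math.pi * eps / 2))

def sigma_theta(sigma_phi, eps):
    """Propagated standard deviation bound of Eq. (III.12)."""
    return sigma_phi / (4 * math.cos(math.pi * eps / 2))
```

For example, with `theta = 0.05` and `eps = 0.02`, `theta_estimate(phi_from_theta(theta, eps), eps)` differs from `theta` by less than `theta ** 3`, in line with the $O(\theta^3)$ remainder.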

In the case that the relationship between $\theta$ and $\Phi$ is not close to linear (which can be
checked using Eq. (III.9)), then while our technique gives a bound on the variance of our estimate
of $\Phi$, because we do not know the form of the distribution of this estimate, we cannot easily
bound the variance of our estimate of $\theta$. In this case, we recommend using non-parametric
bootstrapping [8], which, at the cost of a constant multiplicative overhead, can be used to
estimate the variance of the estimate of $\theta$ obtained from this procedure, without any
assumptions on a linear relationship between $\Phi$ and $\theta$. While it is possible that this
non-linearity would keep the estimate of $\theta$ from being Heisenberg limited, as long as the
variance of our estimate of $\Phi$ is small, the relationship between $\Phi$ and $\theta$ should
be approximately linear, and so we expect that we will always be Heisenberg limited in our
estimate.

In Section II, we claimed that our techniques can be applied to characterize $Z_\chi(\alpha)$ and
$X_\phi(\epsilon,\theta)$ for arbitrary $\chi$ and $\phi$. Our techniques immediately extend to
give estimates of $\alpha$ and $\epsilon$ for these rotations, but it may not be immediately clear
how to obtain an estimate of $\theta$ in this case. The procedure is quite straightforward. First,
choose a positive integer $q$ such that $q\phi = t\pi$ for an odd integer $t$.¹ Construct

$$U_\phi = Z_{\pi/2}(0)\, X_\phi(\epsilon,\theta)^q\, Z_{\pi/2}(0)^2\, X_\phi(\epsilon,\theta)^q\, Z_{\pi/2}(0). \tag{III.13}$$

Using the same procedure as before, we can then estimate $\theta$, assuming $|t\epsilon|$ is not
too large ($|t\epsilon| < 0.341$ is sufficient if $\sin^2(\theta) < 1/\sqrt{8}$).


IV. BOUNDING AND QUANTIFYING OTHER 
ERRORS 


In Section III, we showed how to construct sequences such that, if states are prepared perfectly,
measurements are performed perfectly, and the gates are exactly of the form we assume, then one
can estimate $\alpha$, $\epsilon$, and $\theta$ at the Heisenberg limit. In this section, we show
that these assumptions can be relaxed, and examine their effect on our protocol.

We will completely restrict ourselves to a Hilbert space of dimension 2. (So we assume all states
and operators exist and act only on this subspace.) Let $\mathrm{Pos}(2)$ be the set of positive
semidefinite operators on the Hilbert space of dimension 2. By $A \succeq B$, we mean $A - B$ is
positive semidefinite. Consider a general scenario in which we would like to prepare a state
$\rho$, apply a CPTP map $\mathcal{E}$ (which might be a sequence of gates), and then measure with
the POVM $W = \{W_1, \dots, W_k\}$. Then the probability of obtaining outcome $i$ is

$$p_i = \mathrm{tr}\left(W_i\,\mathcal{E}(\rho)\right). \tag{IV.1}$$

Suppose, however, that instead of preparing the state $\rho$ perfectly, we prepare the faulty
state $\rho'$, apply the faulty CPTP map $\mathcal{E}'$, and measure using the faulty POVM
$W' = \{W'_1, \dots, W'_k\}$. In this case, the probability of obtaining outcome $i$ is

$$p'_i = \mathrm{tr}\left(W'_i\,\mathcal{E}'(\rho')\right). \tag{IV.2}$$

Since we care about additive errors, which are a difference in probability between the desired
experiment and the implemented experiment, we would like to bound $|p_i - p'_i|$. Using the
triangle inequality, we have

$$|p_i - p'_i| \le \left|\mathrm{tr}\left(W_i\mathcal{E}(\rho)\right) - \mathrm{tr}\left(W_i\mathcal{E}'(\rho)\right)\right| + \left|\mathrm{tr}\left(W_i\mathcal{E}'(\rho)\right) - \mathrm{tr}\left(W'_i\mathcal{E}'(\rho)\right)\right| + \left|\mathrm{tr}\left(W'_i\mathcal{E}'(\rho)\right) - \mathrm{tr}\left(W'_i\mathcal{E}'(\rho')\right)\right|. \tag{IV.3}$$

Thus the difference in experimental outcome can be split into separate contributions due to gate
error, measurement error, and state preparation error.

In particular, measurement error is bounded by

$$\delta_{W_i,W'_i} = \max_{\substack{\rho\in\mathrm{Pos}(2) \\ \mathrm{tr}(\rho)=1}} \left|\mathrm{tr}\left((W_i - W'_i)\rho\right)\right|, \tag{IV.4}$$

state preparation error is bounded by

$$\delta_{\rho,\rho'} = \max_{\substack{W\in\mathrm{Pos}(2) \\ W \preceq I}} \left|\mathrm{tr}\left(W(\rho - \rho')\right)\right| = \frac{1}{2}\|\rho - \rho'\|_1, \tag{IV.5}$$

where $\|\cdot\|_1$ is the $\ell_1$ norm or "trace distance" (see [25]), and the gate error is
bounded by²

$$\delta_{\mathcal{E},\mathcal{E}'} = \max_{\substack{W,\rho\in\mathrm{Pos}(2) \\ W \preceq I \\ \mathrm{tr}(\rho)=1}} \left|\mathrm{tr}\left(W\mathcal{E}(\rho)\right) - \mathrm{tr}\left(W\mathcal{E}'(\rho)\right)\right| = \frac{1}{2}\max_{\substack{\rho\in\mathrm{Pos}(2) \\ \mathrm{tr}(\rho)=1}} \|\mathcal{E}(\rho) - \mathcal{E}'(\rho)\|_1. \tag{IV.6}$$

In Section IV A we examine the impact of imperfect Z rotations on the gate error contribution to
additive errors. In Section IV B, we analyze the effect of depolarizing errors on the gate error
contribution to additive errors. Then in Section IV C we look at state preparation and measurement
errors and their contributions to additive errors.


1 It may happen that such a $q$ is impossible to find (e.g. if $\phi = 2\pi/3$). Such cases occur
when $\phi = (a/b)\pi$ for $a/b$ a reduced fraction and $a$ even. However, letting $c = a/2^s$ be
the odd integer part of $a$, calibrating a rotation by $\beta = (c/b)\pi$ is possible, and a
rotation by $\phi$ can be obtained by doing $2^s$ rotations by $\beta$.


2 We use the bounded rather than completely bounded (diamond) norm here because we are restricting
our Hilbert space to be of dimension 2.


A. Errors in Z Rotations 


In Section III C, we described a unitary operation $U$, which involved applying the rotation
$Z_{\pi/2}(0)$. Suppose that we cannot implement $Z_{\pi/2}(0)$, but instead can implement
$Z_{\pi/2}(\alpha)$. Let $U'$ be the gate that results when $Z_{\pi/2}(0)$ is replaced by
$Z_{\pi/2}(\alpha)$ in Eq. (III.3). Let $\mathcal{U}$ and $\mathcal{U}'$ label the corresponding
CPTP maps.

Using a similar triangle inequality as in Eq. (IV.3), we have that

$$\left|\mathrm{tr}\left(M_i\left(\mathcal{U}^k - (\mathcal{U}')^k\right)(\rho)\right)\right| \le 2k \max_{\substack{\rho\in\mathrm{Pos}(2) \\ \mathrm{tr}(\rho)=1}} \left\|\left(\mathcal{Z}_{\pi/2}(0) - \mathcal{Z}_{\pi/2}(\alpha)\right)(\rho)\right\|_1 \le 4k\left|\sin\left(\frac{\pi\alpha}{4}\right)\right|, \tag{IV.7}$$

so a non-zero $\alpha$ contributes at most an amount $k\pi|\alpha|$ to
$\delta_{\mathcal{U}^k,\mathcal{U}'^k}$. For the additive error to be bounded, we require
$|\alpha| = O(1/k)$.

In Section III A, we showed that using $O(N)$ applications of $Z_{\pi/2}(\alpha)$, we could
estimate $\alpha$ with standard deviation $O(1/N)$. Assuming that the control of $\alpha$ is
precise enough to correct $\alpha$ to within the uncertainty of this estimate, we can obtain a new
Z rotation $Z_{\pi/2}(\alpha')$ with $|\alpha'| = O(1/N)$. This improved rotation can then be used
to implement the protocol for estimating $\theta$ in Section III C with standard deviation
$O(1/N)$. Notice that both procedures ($\alpha$ and $\theta$ estimation) together use $O(N)$
applications of gates, so in the end, we can obtain an estimate of $\theta$ that scales at the
Heisenberg limit.

In practice, it is unrealistic to assume that experimentalists have arbitrarily precise controls,
and so at some point, even if $\alpha$ is estimated very precisely, it cannot be corrected.
However, in that case, there is no need to obtain such a precise estimate, for the very reason
that it cannot be corrected.

We note that the strategy employed in this section is very general, and can be employed for
general CPTP errors. However, when the errors have certain structure, we can do better, as in the
case of depolarizing errors, which we analyze in the next section.


B. Depolarizing Errors 

We now consider the effect of depolarizing noise. We look at the case that each applied gate is
accompanied by depolarizing noise $\Lambda_\gamma$, where

$$\Lambda_\gamma(\rho) = \gamma\rho + (1-\gamma)\,I/2. \tag{IV.8}$$

If we have an experiment that involves a sequence of $k$ gates, and the probability of a certain
outcome assuming no depolarizing noise is $1/2 + r$ (for $|r| \le 1/2$), then in the presence of
depolarizing noise, the probability of that outcome will be $1/2 + \gamma^k r$. This gives a gate
error of

$$\delta_{\Lambda_\gamma} = |r|(1-\gamma^k) \le (1-\gamma^k)/2. \tag{IV.9}$$

For depolarizing errors with $\gamma = 0.99$, which is reasonable for many quantum systems, one
could go to sequences of over 100 operations before the depolarizing error would overwhelm the
$1/\sqrt{8}$ bound of Theorem 1.1. Thus if the depolarizing error is small compared to the
uncertainty in state preparation and measurement error, Theorem 1.1 says that our procedure will
give more accurate estimates of the parameters of interest than could be obtained using standard
procedures.

In fact, in the case of depolarizing errors, because of their simple form, one can do better than
simply incorporating them into additive errors. The procedure of Section V can be re-analyzed in
the presence of depolarizing errors, allowing for more precise bounds. In the interest of
conciseness and clarity, we relegate this analysis to later work.


C. State Preparation Errors and Measurement
Errors


State preparation and measurement (SPAM) errors are handled very well in general by our procedure.
This is because SPAM errors contribute a constant additive error
($\delta_{M_i,M'_i} + \delta_{\rho,\rho'}$) no matter what gates or operations are applied in
between state preparation and measurement. As long as these additive errors are not too large, our
protocol works. However, there is a challenge in bounding state preparation errors. Up until this
point, we have tried to make as few assumptions as possible. However, without good gates or good
measurements, it is very difficult to empirically bound the fiducial state preparation error.
Therefore, we do have to make an assumption: we assume that the experimenter has an upper bound on
the trace distance between their true state preparation $\rho_{|0\rangle\langle 0|}$ and the ideal
state preparation $|0\rangle\langle 0|$. (Once gates have been roughly calibrated, better bounds
on this distance can then be obtained.) In many experimental set-ups, the prepared state will be
extremely close to the ideal [13, 24, 27]. We have

$$\delta_{|0\rangle\langle 0|,\,\rho_{|0\rangle\langle 0|}} \ge \frac{1}{2}\left\|\rho_{|0\rangle\langle 0|} - |0\rangle\langle 0|\right\|_1. \tag{IV.10}$$

Now given the initial state $\rho_{|0\rangle\langle 0|}$ and our faulty gates $Z_{\pi/2}(\alpha)$
and $X_{\pi/4}(\epsilon,\theta)$, we would like to create states that are close in trace distance
to $|+\rangle$ and $|{\rightarrow}\rangle$.

We will use the states

$$\rho_{|+\rangle\langle +|} = \mathcal{Z}_{\pi/2}(\alpha)\,\mathcal{X}_{\pi/4}(\epsilon,\theta)^2\left(\rho_{|0\rangle\langle 0|}\right),$$

$$\rho_{|{\rightarrow}\rangle\langle{\rightarrow}|} = \mathcal{X}_{\pi/4}(\epsilon,\theta)^6\left(\rho_{|0\rangle\langle 0|}\right). \tag{IV.11}$$
Let $\epsilon_1 = \max\{\epsilon, \theta, \alpha\}$ and $\epsilon_2 = \max\{\epsilon, \theta\}$.
Then using the triangle inequality, one can calculate that

1 P (i r 4 \ 1 / 4 

2 IIPI+X+I - l+X+Hli <| (y (12 + + tt 2 )J 

+ ^|0)<0|,P|0><0| + 0(£l /4 )> 

1 If 9n 2 \ 1,/4 

+ ( 5 |o)<o|,p |0><0 | + 0 (d /4 )- 

(IV. 12) 

In other words, we can create approximate state preparations, which induce additive errors of the
order of the size of the errors in the gates used to create them, plus the base additive error
from incorrect preparation of $|0\rangle\langle 0|$.

Let $W$ be a measurement operator that is ideally close to $|0\rangle\langle 0|$. In Appendix C we
show how to bound

$$\delta_{|0\rangle\langle 0|,W} = \max_\rho \left|\mathrm{tr}(W\rho) - \mathrm{tr}\left(|0\rangle\langle 0|\rho\right)\right| \tag{IV.13}$$

given access to the state $|0\rangle\langle 0|$ and any other state. As usual, if
$\rho_{|0\rangle\langle 0|}$ is used instead of $|0\rangle\langle 0|$, the difference in outcomes
will be bounded by $\delta_{|0\rangle\langle 0|,\,\rho_{|0\rangle\langle 0|}}$.

A rotation similar to what is used in state preparation can be applied to $W$ to obtain
$W_{|+\rangle\langle +|}$ (an operator close to $|+\rangle\langle +|$), and the additive error for
this measurement can be found using the standard triangle inequality strategy we have employed
multiple times.
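
As a worked check of the depolarizing bound in Eq. (IV.9), the sketch below computes the longest sequence length for which the worst-case additive error $(1-\gamma^k)/2$ stays under the $1/\sqrt{8}$ threshold of Theorem 1.1. It uses only the Python standard library; the function names are illustrative, and the bound is the worst case $|r| = 1/2$.

```python
import math

BOUND = 1 / math.sqrt(8)  # threshold on additive errors from Theorem 1.1

def depolarizing_error_bound(gamma, k):
    """Worst-case additive error (1 - gamma^k)/2 from Eq. (IV.9) after k gates."""
    return (1 - gamma ** k) / 2

def max_sequence_length(gamma, bound=BOUND):
    """Largest k with (1 - gamma^k)/2 < bound; assumes 0 < gamma < 1 and bound < 1/2."""
    k = 0
    while depolarizing_error_bound(gamma, k + 1) < bound:
        k += 1
    return k
```

For `gamma = 0.99`, `max_sequence_length` returns 122, consistent with the statement that sequences of over 100 operations are possible before the depolarizing error overwhelms the bound.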


V. NON-ADAPTIVE HEISENBERG LIMITED 
PHASE ESTIMATION 

In this section, we will prove Theorem 1.1. First, in Section V A, to set up the main ideas, we
review, and slightly improve, the proof of Heisenberg scaling without additive errors by Higgins
et al. [10]. This motivates our proof in Section V B.


A. Heisenberg limit without errors 


Our proof of Theorem 1.1 is based on the non-adaptive phase estimation procedure of Higgins et al.
[10], which states


Theorem V.1. [10] Say that we can perform two families of experiments, $|0\rangle$-experiments
and $|+\rangle$-experiments, indexed by $k \in \mathbb{Z}$, whose probabilities of success are,
respectively,

$$p_0(A,k) = \frac{1+\cos(kA)}{2}, \tag{V.1}$$

$$p_+(A,k) = \frac{1+\sin(kA)}{2}. \tag{V.2}$$

Also assume that performing either of the $k$th experiments takes time proportional to $k$. Then,
an estimate $\hat{A}$ of $A \in (-\pi,\pi]$ with standard deviation $\sigma(\hat{A})$ can be
obtained in time $T = O(1/\sigma(\hat{A}))$ using non-adaptive measurements.


We reprove Theorem V.1 because we use new techniques
that give improved analytic bounds on the scaling
of $T\sigma(\hat A)$ compared to [10]. These techniques might
additionally be of broader use.

For a given $k$, let $a_0$ ($a_+$) be the number of successful
outcomes of the $|0\rangle$- ($|+\rangle$-) experiments, respectively, if
$M$ samples are taken of each experiment. Then one can
obtain an estimate $\widehat{kA}$ for $kA$ with standard deviation
$\sigma(\widehat{kA})$:

$$\widehat{kA} = \mathrm{atan2}\left(a_+ - M/2,\; a_0 - M/2\right) \in (-\pi, \pi], \qquad \sigma(\widehat{kA}) \propto \frac{1}{\sqrt{M}}. \tag{V.3}$$

It is tempting to use this to get an estimate $\hat A = \widehat{kA}/k$
for $A$, apparently with standard deviation

$$\sigma(\hat A) \propto \frac{1}{k\sqrt{M}}, \tag{V.4}$$

which gives Heisenberg scaling if $M$ is independent of
$k$. Unfortunately, this estimate is deceptive as it is only
correct up to additive shifts of $2\pi n/k$, $n \in \mathbb{Z}$, due to the unknown
principal range of $kA$.
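The ambiguity described above can be seen in a small numerical sketch. The helper names `ideal_counts` and `estimate_kA` are illustrative and not from the paper, and the counts are idealized (noise-free, possibly non-integer) expectation values rather than sampled data.

```python
import math

def ideal_counts(A, k, M):
    """Expected (noise-free) success counts for M shots of each experiment.

    Idealization for illustration: real counts are integers drawn from
    binomial distributions with these means.
    """
    a0 = M * (1 + math.cos(k * A)) / 2      # |0>-experiment, Eq. (V.1)
    a_plus = M * (1 + math.sin(k * A)) / 2  # |+>-experiment, Eq. (V.2)
    return a0, a_plus

def estimate_kA(a_plus, a0, M):
    """Estimate of kA from success counts, following Eq. (V.3)."""
    return math.atan2(a_plus - M / 2, a0 - M / 2)

# With A = 2 and k = 4 the true phase kA = 8 lies outside (-pi, pi],
# so the estimate is the aliased value 8 - 2*pi.
a0, a_plus = ideal_counts(A=2.0, k=4, M=1000)
aliased = estimate_kA(a_plus, a0, 1000)
```

Dividing `aliased` by `k` therefore recovers `A` only up to a multiple of `2*pi/k`, which is the principal-range problem the multi-`k` procedure addresses.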

To determine the correct range of $\widehat{kA}/k$ while still retaining
Heisenberg scaling, Higgins et al. instead sample
distributions with a range of values of $k$. In particular,
they choose $k$ from $\{k_1, \ldots, k_K\}$, with $k_j = 2^{j-1}$. Let
$\hat A_j = \widehat{k_j A}/k_j$ be an estimate of $A$ obtained from setting
$k = k_j$. Then $\hat A_1$ is used to restrict estimates $\hat A_j$ for
$j > 1$ to the range $(\hat A_1 - \pi/2, \hat A_1 + \pi/2]$. Continuing in
this way, we assume $\hat A_{j+1} \in (\hat A_j - \pi/2^j, \hat A_j + \pi/2^j]$. (This
restriction differs slightly from Higgins et al., in which
they assume $\hat A_{j+1} \in (\hat A_j - \pi/3^j, \hat A_j + \pi/3^j]$. This small
difference allows us to apply much stronger bounds to
the probability of failure at any step.)
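A minimal simulation sketch of the procedure just described follows, under some assumptions: the experiments are ideal (no additive errors), the function names are illustrative, and the sample schedule $M_j = \alpha(K-j)+\beta$ is the one introduced later in Eq. (V.11). Choosing the candidate nearest to the previous estimate is equivalent to restricting $\hat A_{j+1}$ to a window of half-width $\pi/2^j$ around $\hat A_j$.

```python
import math
import random

def sample_counts(A, k, M, rng):
    """Simulate M shots each of the ideal |0>- and |+>-experiments at index k."""
    p0 = (1 + math.cos(k * A)) / 2
    p_plus = (1 + math.sin(k * A)) / 2
    a0 = sum(rng.random() < p0 for _ in range(M))
    a_plus = sum(rng.random() < p_plus for _ in range(M))
    return a0, a_plus

def nonadaptive_estimate(A, K, alpha, beta, rng):
    """Estimate A from experiments with k_j = 2**(j-1), j = 1..K."""
    estimate = 0.0
    for j in range(1, K + 1):
        k = 2 ** (j - 1)
        M = math.ceil(alpha * (K - j) + beta)   # samples at step j
        a0, a_plus = sample_counts(A, k, M, rng)
        kA_hat = math.atan2(a_plus - M / 2, a0 - M / 2)
        # kA_hat / k is only defined modulo 2*pi/k; pick the candidate
        # closest to the estimate from the previous step.
        period = 2 * math.pi / k
        candidate = kA_hat / k
        shift = round((estimate - candidate) / period)
        estimate = candidate + shift * period
    return estimate

rng = random.Random(0)
A_true = 1.234
A_hat = nonadaptive_estimate(A_true, K=8, alpha=3, beta=20, rng=rng)
```

The value $\beta = 20$ is deliberately generous so that range errors are very unlikely in this demonstration; it is not a recommended setting.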

We immediately see that $\hat A_K$ will only be in the correct
principal range conditional on all prior estimates $\hat A_j$
being within $\pm\pi/(2k_j)$ of the actual value of $A$. In other
words, the probability

$$p_{\rm error}(k_j A) \equiv P\left[ k_j(\hat A_j - A) \ge \frac{\pi}{2} \;\vee\; k_j(\hat A_j - A) < -\frac{\pi}{2} \right] \tag{V.5}$$

must be small for all $j$, where the average is taken over
possible estimates $\widehat{k_j A}$. (We define $p_{\rm error}(k_j A)$ as stated
instead of as $P\left[\,|k_j(\hat A_j - A)| \ge \pi/2\,\right]$ in order to obtain
slightly better bounds.) Any one such error occurring
will lead to an incorrect range of $\hat A_K$ and thus an incorrect
estimate of $A$. As the precise value of $p_{\rm error}$ has a
significant impact in evaluating the scaling constant of
$\sigma(\hat A) = O(1/T)$, a careful bound on $p_{\rm error}$ is required. In
Lemma A.1 in Appendix A, we show that if $M_j$ samples
are taken of each of the $k_j$ $|0\rangle$- and $|+\rangle$-experiments,

$$p_{\max}(M_j) \equiv \frac{1}{\sqrt{2\pi M_j}\, 2^{M_j}} > p_{\rm error}(k_j A). \tag{V.6}$$


This is a stronger bound than what appears in Higgins
et al., which is derived from Hoeffding's bound. This
stronger bound in turn allows us to obtain a better analytic
bound on the variance of our final estimate.

To calculate the variance of our estimate, we note that
if the first error in our principal range estimates occurs at
$k_h$ (that is, no errors occur for all $k_j < k_h$), then the
maximum error in our estimate is

$$\xi(h) = \frac{2\pi}{2^{h}}. \tag{V.7}$$

Furthermore, even if we have no errors in our principal
range estimates, our final estimate can still differ from
the true value by at most

$$\xi(K+1) = \frac{2\pi}{2^{K+1}}. \tag{V.8}$$

Thus, we can bound the variance of our estimate $\hat A$ of
$A$ with

$$\begin{aligned}
\sigma^2(\hat A) &\le \left(1 - p_{\rm error}(k_K A)\right)\xi(K+1)^2 + \sum_{j=1}^{K} \xi(j)^2\, p_{\rm error}(k_j A) \prod_{i=1}^{j-1}\left(1 - p_{\rm error}(k_i A)\right) \\
&< \xi(K+1)^2 + \sum_{j=1}^{K} \xi(j)^2\, p_{\max}(M_j).
\end{aligned} \tag{V.9}$$

Note that the first term is a variance contribution from
the event of no errors whereas the second term is the
contribution in the event where errors arise.

We assume that running the $k_j$ $|0\rangle$- or $|+\rangle$-experiment
takes time $k_j$. Then the total time required for our estimate
is

$$T = 2\sum_{j=1}^{K} 2^{j-1} M_j. \tag{V.10}$$


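As in [10], setting $\partial_{M_j}\left(\sigma^2(\hat A)T^2\right) = 0$, we find Heisenberg
scaling can be attained by setting

$$M_j = \alpha(K - j) + \beta \tag{V.11}$$

for $\alpha, \beta \in \mathbb{Z}^+$. The sum in Eq. (V.9) can be performed
by making the replacement $p_{\max}(M_j) \le 2^{-M_j}/\sqrt{2\pi\beta}$. One
finds that $\alpha > 2$ is necessary to prevent the sum from
growing faster than $\sim 4^{-K}$, which results in poorer-than-Heisenberg
scaling. We obtain

$$\begin{aligned}
\sigma^2(\hat A) &\le \frac{\pi^2}{4^{K}}\left[1 + p_{\max}(\beta)\left(3 + \frac{4}{2^{\alpha-2} - 1}\right)\right], \\
T &\le 2^{K+1}(\alpha + \beta), \\
\sigma(\hat A)T &\le 2\pi(\alpha + \beta)\sqrt{1 + p_{\max}(\beta)\left(3 + \frac{4}{2^{\alpha-2} - 1}\right)},
\end{aligned} \tag{V.12}$$

which holds for all $K > 0$.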

Thus Heisenberg scaling can be obtained for any $\alpha > 2$,
$\beta > 0$. Optimizing Eq. (V.12) over the integers gives
$\sigma(\hat A)T < 12.4\pi$ at $\alpha = 3$, $\beta = 1$. Better bounds of
$\sigma(\hat A)T < 10.7\pi$ can be attained at $\alpha = 5/2$, $\beta = 1/2$,
where fractional values of $M_j$ mean one rounds up to
the nearest integer value and performs that many experiments.
This improved bound also uses a more sophisticated
analysis of Eq. (V.9), in which we pull out
the last $j = K, K-1, \ldots, K-z$ terms from the sum in
Eq. (V.9), and use p max (M J ) < y j 2 ^^ K _^u j » for values 

of $j < K - z$ to transform the remainder into a geometric
sum. These analytic bounds are significant practical
improvements over those in [10], where $\sigma(\hat A)T < 54\pi$ at
$\alpha = 8\ln 2$, $\beta = 23/2$.
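As a numerical sanity check on the integer-optimized constant, the following sketch evaluates a closed-form bound on $\sigma(\hat A)T$. The closed form is an assumption: it is obtained here by summing Eq. (V.9) as a geometric series with $\xi(j) = 2\pi/2^j$, the replacement $p_{\max}(M_j) \le p_{\max}(\beta)\,2^{-\alpha(K-j)}$, and the $j = K$ term combined with the no-error term. The function names are illustrative.

```python
import math

def p_max(M):
    """Bound of Eq. (V.6) on the per-step error probability."""
    return 1.0 / (math.sqrt(2 * math.pi * M) * 2 ** M)

def sigma_T_bound(alpha, beta):
    """Assumed closed-form bound on sigma(A_hat) * T for alpha > 2, beta > 0."""
    if alpha <= 2 or beta <= 0:
        raise ValueError("geometric sum requires alpha > 2 and beta > 0")
    bracket = 1 + p_max(beta) * (3 + 4 / (2 ** (alpha - 2) - 1))
    return 2 * math.pi * (alpha + beta) * math.sqrt(bracket)

def total_time(K, alpha, beta):
    """Total time of Eq. (V.10) with M_j = alpha*(K-j) + beta, rounded up."""
    return 2 * sum(2 ** (j - 1) * math.ceil(alpha * (K - j) + beta)
                   for j in range(1, K + 1))

constant_in_units_of_pi = sigma_T_bound(3, 1) / math.pi
```

Under these assumptions, `constant_in_units_of_pi` rounds to the quoted value of 12.4 for $\alpha = 3$, $\beta = 1$; the tighter $10.7\pi$ figure relies on the more detailed analysis mentioned in the text and is not reproduced by this simple formula.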

We compare our result to the scaling of various other
phase estimation procedures (including maximum likelihood
and procedures using entanglement) in Appendix
B. While the improved analysis of this section gives us
better analytic scaling than was previously known for
non-adaptive phase estimation, our main motivation is
to obtain better results in the presence of additive errors.
The new analysis allows us to include much larger
additive errors than would have been possible previously.


B. Including additive errors 


We now consider the case that the success probabilities
of our experiments differ from the ideal probabilities by
additive terms $\delta_0(k_j)$ and $\delta_+(k_j)$ as

$$p_0(A, k_j) = \frac{1 + \cos k_j A}{2} + \delta_0(k_j), \tag{V.13}$$

$$p_+(A, k_j) = \frac{1 + \sin k_j A}{2} + \delta_+(k_j). \tag{V.14}$$

Let

$$\delta_j = \max\{|\delta_0(k_j)|, |\delta_+(k_j)|\}. \tag{V.15}$$

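Suppose we use exactly the same procedure to estimate
$A$ as in the case of no additive errors. Then in Lemma
A.2 in Appendix A we show that now,

$$p_{\max}(M_j, \delta_j) \equiv \frac{\left(1 - \frac{1}{2}\left(1 - \sqrt{8}\,\delta_j\right)^2\right)^{M_j}}{\sqrt{2\pi M_j}\,\left(1 - \sqrt{8}\,\delta_j\right)} > p_{\rm error}(k_j A), \tag{V.16}$$

where $p_{\rm error}(k_j A)$ is defined in Eq. (V.5).

Now consider replacing $M_j$ by $F(\delta_j, M_j) \times M_j$, where
$F(\delta_j, M_j)$ is

$$F(\delta_j, M_j) = \frac{\log\left(\frac{1}{2}\left(1 - \sqrt{8}\,\delta_j\right)^{1/M_j}\right)}{\log\left(1 - \frac{1}{2}\left(1 - \sqrt{8}\,\delta_j\right)^2\right)}. \tag{V.17}$$

Then as long as $\delta_j < 1/\sqrt{8} \approx 0.354$, we have

$$p_{\max}\left(F(\delta_j, M_j)M_j, \delta_j\right) < \frac{1}{\sqrt{2\pi M_j}\, 2^{M_j}}. \tag{V.18}$$

This bound is the same as Eq. (V.6). This means that by
increasing the number of samples of the $j$th experiment
by a factor $F(\delta_j, M_j)$, we can get the same error bounds
as if there were no additive errors $\delta_0(k_j)$ and $\delta_+(k_j)$.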
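The effect of the oversampling factor can be checked numerically. The sketch below implements Eqs. (V.16) and (V.17) as read here, namely $p_{\max}(M,\delta) = [1-\tfrac12(1-\sqrt8\delta)^2]^M / (\sqrt{2\pi M}(1-\sqrt8\delta))$ and $F = \log(\tfrac12(1-\sqrt8\delta)^{1/M}) / \log(1-\tfrac12(1-\sqrt8\delta)^2)$; the function names are illustrative.

```python
import math

def p_max_noisy(M, delta):
    """Error-probability bound with additive error delta < 1/sqrt(8)."""
    c = 1 - math.sqrt(8) * delta
    return (1 - 0.5 * c * c) ** M / (math.sqrt(2 * math.pi * M) * c)

def sample_factor(delta, M):
    """Oversampling factor F(delta, M) that restores the error-free bound."""
    c = 1 - math.sqrt(8) * delta
    return math.log(0.5 * c ** (1 / M)) / math.log(1 - 0.5 * c * c)

# Example: with delta = 0.1 and M = 5, taking F*M samples brings the noisy
# bound back below the noise-free bound p_max_noisy(M, 0).
F = sample_factor(0.1, 5)
restored = p_max_noisy(F * 5, 0.1)
noise_free = p_max_noisy(5, 0.0)
```

`F` grows without bound as `delta` approaches `1/sqrt(8)`, which is the loss of Heisenberg scaling discussed in the text that follows.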

Suppose there is some smallest $h$ such that $\delta_h \ge 1/\sqrt{8}$.
In this case, no matter how many samples we take,
$p_{\rm error}(k_h A)$ will not be bounded. However, we can still use
the procedure of the previous section to obtain an estimate
of $A$ with variance proportional to $4^{-(h-1)}$, by
using $F(\delta_j, M_j)M_j$ samples for each $j \le h - 1$, proving
the second part of Theorem I.1.

Furthermore, if

$$\sup_j \delta_j = 1/\sqrt{8}, \quad \text{but} \quad \max_j \delta_j \ne 1/\sqrt{8}, \tag{V.19}$$

then we can always increase the number of samples taken
of each experiment in order to counteract the effect of
additive errors. This means that we can obtain arbitrarily
accurate estimates. However, the size of the required
$F(\delta_j, M_j)$ blows up, so we will no longer have Heisenberg
scaling.

However, if $\sup_j \delta_j < 1/\sqrt{8}$, then for all $j$, we have

$$\delta_j < 1/\sqrt{8} - \epsilon \equiv \delta' \tag{V.20}$$

for some constant $\epsilon$. Then if we take $F_j M_j$ samples of
the $j$th iteration, where

$$F_j = \frac{\log\left(\frac{1}{2}\left(1 - \sqrt{8}\,\delta'\right)^{1/M_j}\right)}{\log\left(1 - \frac{1}{2}\left(1 - \sqrt{8}\,\delta'\right)^2\right)}, \tag{V.21}$$

we can attain the correct bounds on $p_{\rm error}$. If we set
$M_j = \alpha(K - j) + \beta$ as before, $M_j$ is a monotonically
decreasing sequence in $j$, so $F_j$ is a monotonically increasing
sequence. Thus, we have $F_j \le F_K$ for all
$j = 1, 2, \ldots, K$.

If for each $M_j$ we replace $M_j$ by $F_j M_j$, we have increased
the total time required by the procedure by at
most a constant factor $F_K$, and obtained at least as good
a $p_{\rm error}$ at each step as in the case without any errors
$\delta_0(k_j)$ or $\delta_+(k_j)$. Thus we can obtain Heisenberg scaling,
where $T\sigma$ increases by at most the constant factor $F_K$ compared
to the case without additive errors $\delta_0(k_j)$ or $\delta_+(k_j)$. This
completes the proof of Theorem I.1.


VI. CONCLUSIONS AND OPEN PROBLEMS 


There are many ways to extend and refine the ideas of
this paper. In particular, while the techniques described
here seem to apply broadly for single-qubit operations,
it would be both interesting theoretically and of great
practical use if these procedures could be extended to
multi-qubit operations.

Additionally, there is much room for improvement in
terms of error analysis. In this work, we've suggested
treating depolarizing or amplitude damping noise as contributing
to additive errors. However, this is essentially
a worst-case scenario, in which every process adversarially
drives you away from the desired state by as much
as possible. In reality, we would expect the repeated
applications of the gate to have a twirling effect, thus
mitigating, or at least averaging, the effect of noise, as
in randomized benchmarking [16]. In addition it would
be of practical relevance to analyze the case where 6a 
and 6a are not fixed, but shift over time. 

Finally, at least on the surface, our procedure has
many similarities to randomized benchmarking: both
procedures are (more or less) robust to SPAM errors, and
involve applying increasingly lengthy sequences of operations.
These similarities raise the question: is there an
explicit connection between phase estimation and randomized
benchmarking?

Appendix A: Bounds on $p_{\rm error}$


In this section, we bound the probability of making an
error at any step during our estimation procedure. An
error occurs at the $j$th iteration if

$$\left(k_j(\hat A_j - A) \ge \frac{\pi}{2}\right) \vee \left(k_j(\hat A_j - A) < -\frac{\pi}{2}\right). \tag{A.1}$$

In the analysis below, we replace $k_j A$ with the variable
$\varphi$ and $k_j \hat A_j$ with $\hat\varphi$. In Lemma A.1, we consider the case
without additive errors $\delta_0(k_j)$ and $\delta_+(k_j)$, and in Lemma
A.2 we include these errors.


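Lemma A.1. For $\varphi \in (-\pi, \pi]$ let

$$p_0 = \frac{1 + \cos(\varphi)}{2}, \tag{A.2}$$

$$p_+ = \frac{1 + \sin(\varphi)}{2}, \tag{A.3}$$

and let $a_0$ (respectively $a_+$) be drawn from the binomial
distribution $B(M, p_0)$ (resp. $B(M, p_+)$). Let

$$\hat\varphi = \mathrm{atan2}\left(\frac{2}{M}a_+ - 1,\; \frac{2}{M}a_0 - 1\right) \tag{A.4}$$

be an estimate for $\varphi$ (and if $a_0 = a_+ = M/2$, then $\hat\varphi$ is
chosen uniformly at random from $(-\pi, \pi]$). Then

$$p_{\rm error}(\varphi) < \frac{1}{\sqrt{2\pi M}\, 2^{M}}, \tag{A.5}$$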









FIG. 1: In Fig. 1a we show how to calculate $\hat\varphi$ given $a_0$ and $a_+$. Note $a_0$ and $a_+$ can take values in $\{0, 1, \ldots, M\}$, so the blue
circular dots represent the possible outcomes $(a_0, a_+)$. In Fig. 1b, we consider the case that $\varphi$ lies along the orange line in the
upper right quadrant, corresponding to the maximum value of $p_{\rm error}$. In this case, all of the points with red square markers
correspond to errors. We sum the probability of being at one of these points by first calculating the probability of being at
one of the points intersected by the green dashed lines.


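where

$$p_{\rm error}(\varphi) = P\left[\left(\hat\varphi - \varphi \ge \pi/2\right) \vee \left(\hat\varphi - \varphi < -\pi/2\right)\right], \tag{A.6}$$

and the probability is taken over the possible outcomes
$a_0$ and $a_+$.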

Proof. While Hoeffding's inequality gives a loose bound
on $p_{\rm error}(\varphi)$, we will use a geometric interpretation to
obtain a stronger and asymptotically exact result. In
particular, we can extract an estimate $\hat\varphi$ for $\varphi$ graphically
by plotting the value of $a_0$ and $a_+$ on orthogonal axes,
as shown in Figure 1a.

Before we take advantage of this geometric interpretation,
we first will show that $p_{\rm error}(\varphi)$ is largest when
$\varphi = \pi/4$, and thus we need only analyze $p_{\rm error}(\pi/4)$.

We introduce the substitution $y = \frac{2}{M}a_+ - 1$, $x = \frac{2}{M}a_0 - 1$
and consider the inner product

$$\hat r = (x, y)\cdot(\cos\varphi, \sin\varphi). \tag{A.7}$$

Note that $p_{\rm error}(\varphi)$ corresponds to the probability that $\hat r$
is less than 0 (with some small correction because of one
sided error). In the limit of very large $M$, $\hat r$ becomes a
weighted sum of two independent normal distributions,
and is hence a normal distribution itself. As normal
distributions are completely characterized by their mean
and variance, in this limit, $p_{\rm error}(\varphi)$ depends only on the
mean and variance of $\hat r$. In particular, $p_{\rm error}(\varphi)$ will be
largest when the mean of this distribution is smallest
and the variance is largest.


Using the well-known properties of binomial distributions
and properties of sums of independent distributions,
we have

$$E[\hat r] = 1, \qquad \mathrm{Var}[\hat r] = \frac{1}{2M}\sin^2(2\varphi). \tag{A.8}$$

Thus the variance of $\hat r$ and hence the probability of error
is largest when $\varphi = \pi/4 + q\pi/2$ for any integer $q$. When
$M$ is not large, we verify (see Figure 2) that $p_{\rm error}(\varphi)$ is
indeed largest at $\varphi = \pi/4$.
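The moments in Eq. (A.8) follow from the binomial mean and variance of $a_0$ and $a_+$. A short sketch (the function name is illustrative) that assembles them from those ingredients and can be compared with the closed form $\sin^2(2\varphi)/(2M)$:

```python
import math

def r_hat_moments(phi, M):
    """Mean and variance of r_hat = x*cos(phi) + y*sin(phi).

    x = 2*a0/M - 1 and y = 2*a_plus/M - 1, with a0 ~ B(M, p0) and
    a_plus ~ B(M, p_plus) independent.
    """
    p0 = (1 + math.cos(phi)) / 2
    p_plus = (1 + math.sin(phi)) / 2
    mean_x, var_x = 2 * p0 - 1, 4 * p0 * (1 - p0) / M
    mean_y, var_y = 2 * p_plus - 1, 4 * p_plus * (1 - p_plus) / M
    mean = mean_x * math.cos(phi) + mean_y * math.sin(phi)
    var = var_x * math.cos(phi) ** 2 + var_y * math.sin(phi) ** 2
    return mean, var

mean_quarter, var_quarter = r_hat_moments(math.pi / 4, 10)
```

At $\varphi = \pi/4$ the variance takes its maximum value $1/(2M)$, consistent with the claim that this is the worst case.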

This leads to a drastic simplification: we need only
bound $p_{\rm error}(\pi/4)$. ($p_{\rm error}(\varphi)$ for $\varphi = \pi/4 + q\pi/2$ is the
same as for $\varphi = \pi/4$ by symmetry.) This corresponds to $\varphi$
lying along the orange line in Figure 1b. Then an error
occurs when values of $a_0$ and $a_+$ correspond to the red
square markers on Figure 1b. Thus to bound $p_{\rm error}(\pi/4)$,
we calculate the probability of ending up at any one of
the red markers. We do this by summing over the cases
where $(a_0 + a_+)$ is constant and no greater than $M$,
corresponding to the dashed green lines in Figure 1b.

For $\varphi = \pi/4$, we have $p_0 = p_+ = p = (2 + \sqrt{2})/4$, and
the probability of finding the outcomes $a_0$ and $a_+$ is

$$P[a_0, a_+] = \binom{M}{a_0}\binom{M}{a_+}\left(\frac{p}{1-p}\right)^{a_0 + a_+}(1-p)^{2M}. \tag{A.9}$$

FIG. 2: Exact probability of error as a function of $\varphi$ by
enumeration over all possible outcomes $a_0$ and $a_+$ that lead
to errors in $\hat\varphi$ defined in Eq. (A.6). Different lines correspond
to the labeled number of repeats $M = 1, 2, \ldots$ from the top.
Observe that the maximum occurs at $\varphi = \pi/4$ for all $M$.


The probability of lying on a line $a_0 + a_+ = b$ is

$$P_{\rm diag}(b) = \sum_{a_0 = 0}^{M} P[a_0, b - a_0] = \binom{2M}{b}\left(\frac{p}{1-p}\right)^{b}(1-p)^{2M}. \tag{A.10}$$
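Summing over the lines of constant $(a_0 + a_+)$ up to $M - 1$
and including half of the line $(a_0 + a_+) = M$, we have,

$$\begin{aligned}
p_{\rm error}(\pi/4) &= \sum_{b=0}^{M} P_{\rm diag}(b) - \frac{1}{2}P_{\rm diag}(M) \\
&= \frac{(2M)!\,\left(p(1-p)\right)^{M}}{(M!)^2}\left[H\!\left(M, \frac{1-p}{p}\right) - \frac{1}{2}\right] \\
&= \frac{(2M)!}{8^{M}(M!)^2}\left[H\!\left(M, \frac{2 - \sqrt{2}}{2 + \sqrt{2}}\right) - \frac{1}{2}\right],
\end{aligned} \tag{A.11}$$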

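where

$$H(M, z) \equiv \sum_{x=0}^{M} \frac{(M!)^2 z^{x}}{(M - x)!\,(M + x)!}. \tag{A.12}$$

As the $x = 0$ term is 1 and the ratio of successive terms
in $H(M, z)$ is

$$\frac{M - x}{1 + M + x}\, z < z, \tag{A.13}$$

we can bound this sum with a geometric series:

$$H(M, z) < \sum_{x=0}^{M} z^{x} < \frac{1}{1 - z}. \tag{A.14}$$

Using Stirling's approximation $n! \sim \sqrt{2\pi n}\,(n/e)^n$ and
noting that the fractional error of the approximation decreases
monotonically with $n$, we obtain the remarkably
simple bound

$$p_{\rm error}(\varphi) \le \frac{1}{\sqrt{2\pi M}\, 2^{M}}, \tag{A.15}$$

which is tight in the limit $M \to \infty$.

□


We now include additive errors into the analysis:

Lemma A.2. For $\varphi \in (-\pi, \pi]$ and $\delta_0$, $\delta_+$ such that
$|\delta_0|, |\delta_+| \le \delta < 1/\sqrt{8}$, let

$$p_0 = \frac{1 + \cos(\varphi)}{2} + \delta_0, \tag{A.16}$$

$$p_+ = \frac{1 + \sin(\varphi)}{2} + \delta_+, \tag{A.17}$$

and let $a_0$ ($a_+$) be drawn from the binomial distribution
$B(M, p_0)$ ($B(M, p_+)$). Let

$$\hat\varphi = \mathrm{atan2}\left(\frac{2}{M}a_+ - 1,\; \frac{2}{M}a_0 - 1\right) \tag{A.18}$$

be an estimate for $\varphi$ (and if $a_0 = a_+ = M/2$, then $\hat\varphi$ is
chosen uniformly at random from $(-\pi, \pi]$). Then

$$p_{\rm error}(\varphi, \delta_0, \delta_+) \le \frac{\left(1 - \frac{1}{2}\left(1 - \sqrt{8}\,\delta\right)^2\right)^{M}}{\sqrt{2\pi M}\,\left(1 - \sqrt{8}\,\delta\right)}, \tag{A.19}$$

where

$$p_{\rm error}(\varphi, \delta_0, \delta_+) = P\left[\left(\hat\varphi - \varphi \ge \pi/2\right) \vee \left(\hat\varphi - \varphi < -\pi/2\right)\right] \tag{A.20}$$

and the probability is taken over the possible outcomes
$a_0$ and $a_+$.

Proof. This proof will be similar to the proof of Lemma
A.1, so we will omit some of the details if they parallel
the previous result. As done in Lemma A.1, we introduce
the substitution $x = \frac{2}{M}a_0 - 1$, $y = \frac{2}{M}a_+ - 1$ and consider

$$\hat r = (x, y)\cdot(\cos\varphi, \sin\varphi). \tag{A.21}$$

We find in this case that in the limit of large $M$,

$$\begin{aligned}
E[\hat r] &= 1 + 2\left(\delta_0\cos\varphi + \delta_+\sin\varphi\right), \\
\mathrm{Var}[\hat r] &= \frac{1 - \left(\cos\varphi\,(2\delta_0 + \cos\varphi)\right)^2 - \left(\sin\varphi\,(2\delta_+ + \sin\varphi)\right)^2}{M}.
\end{aligned} \tag{A.22}$$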

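As explained in the proof of Lemma A.1, $p_{\rm error}$ is maximized
when we simultaneously minimize $\hat r$'s expectation
and maximize its variance. Using $|\delta_0|, |\delta_+| \le \delta$, we have

$$\begin{aligned}
E[\hat r] &\ge 1 + \sqrt{8}\,\delta\cos(\varphi - s), \\
\mathrm{Var}[\hat r] &\le \frac{1}{M}\Big(1 - \cos^2\varphi\,\min\left\{1, \left(2\delta + \sqrt{2}\cos s\cos\varphi\right)^2\right\} \\
&\qquad\quad - \sin^2\varphi\,\min\left\{1, \left(2\delta + \sqrt{2}\sin s\sin\varphi\right)^2\right\}\Big),
\end{aligned} \tag{A.23}$$

where $s = \pi\left(\frac{j}{2} + \frac{1}{4}\right)$, $j = 0, 1, 2, 3$ is used to represent
the signs of $\delta_0$ and $\delta_+$. Thus, the worst-case bounds

$$E[\hat r] \ge 1 - \sqrt{8}\,\delta, \qquad \mathrm{Var}[\hat r] \le \frac{1}{M}\left(1 - \left(\frac{1}{\sqrt{2}} - 2\delta\right)^2\right), \tag{A.24}$$

are obtained when $\delta_0 = \delta_+ = -\delta$ (corresponding to
$s = \pi + \pi/4$) and $\varphi = \pi/4$, leading to $p_0 = p_+ =
p = (2 + \sqrt{2})/4 - \delta$. We thus have $p_{\rm error}(\varphi, \delta_0, \delta_+) \le
p_{\rm error}(\pi/4, -\delta, -\delta)$.

The bound on $p_{\rm error}(\pi/4, -\delta, -\delta)$ is then obtained by
a calculation identical to the proof of Lemma A.1 from
Eq. (A.9) onwards, except with $p = (2 + \sqrt{2})/4 - \delta$. We
obtain

$$\begin{aligned}
p_{\rm error}(\pi/4, -\delta, -\delta) &= \frac{(2M)!}{4^{M}(M!)^2}\left(1 - \frac{1}{2}\left(1 - \sqrt{8}\,\delta\right)^2\right)^{M} \left[H\!\left(M, \frac{2 - \sqrt{2} + 4\delta}{2 + \sqrt{2} - 4\delta}\right) - \frac{1}{2}\right] \\
&\le \frac{\left(1 - \frac{1}{2}\left(1 - \sqrt{8}\,\delta\right)^2\right)^{M}}{\sqrt{2\pi M}\,\left(1 - \sqrt{8}\,\delta\right)}.
\end{aligned} \tag{A.25}$$

Observe that Eq. (A.15) is recovered in the absence of
additive errors (i.e. when $\delta = 0$). □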
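Both lemmas can be checked numerically at the worst-case point $\varphi = \pi/4$. The sketch below assumes the error probability is the diagonal sum of Eq. (A.11), that is, all outcomes with $a_0 + a_+ < M$ plus half of the line $a_0 + a_+ = M$, with $p = (2+\sqrt2)/4 - \delta$, and compares it with the closed-form bound of Eq. (A.19). The function names are illustrative.

```python
import math

def exact_p_error(M, delta=0.0):
    """Diagonal sum for the error probability at phi = pi/4."""
    p = (2 + math.sqrt(2)) / 4 - delta

    def p_diag(b):
        # Probability that a0 + a_plus = b; the sum is Binomial(2M, p).
        return math.comb(2 * M, b) * p ** b * (1 - p) ** (2 * M - b)

    return sum(p_diag(b) for b in range(M)) + 0.5 * p_diag(M)

def analytic_bound(M, delta=0.0):
    """Closed-form bound; reduces to 1/(sqrt(2*pi*M) * 2**M) at delta = 0."""
    c = 1 - math.sqrt(8) * delta
    return (1 - 0.5 * c * c) ** M / (math.sqrt(2 * math.pi * M) * c)

ratios = {M: exact_p_error(M) / analytic_bound(M) for M in (1, 10, 200)}
```

The entries of `ratios` stay below 1 and approach 1 as `M` grows, which illustrates the statement that the bound is tight in the limit of large $M$.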


Appendix B: Scaling of Phase Estimation
Procedures


In Section V A, we gave an analytic bound on the
scaling of our Heisenberg-limited phase estimation technique.
Optimizing Eq. (V.9) gave $\sigma(\hat A)T < 10.7\pi$.

This upper bound on the Heisenberg scaling constant
should of course be compared to lower bounds. A number
of lower bounds are commonly cited in the literature,
depending on the specification of allowed resources. The
best possible bound is $\sigma(\hat A)T \ge 1$ [5], often used in the
atomic clocks community [18]. The resources required
are similar to those used for our scheme, except that
there is no iteration from $j = 1, \ldots, K - 1$, so only the
largest $K$ experiment is used. However, achieving this
bound is only possible when the principal range of $A$ is
known, a reasonable assumption when tracking well-characterized
frequencies, but not when $A$ is completely
unknown.

The next largest bound on the scaling is $\sigma(\hat A)T \ge \pi$ [3],
which is achievable using quantum phase estimation.
Unlike the above case, $A$ can be completely
unknown initially. However, this scheme requires the
resource of entanglement between different experimental
runs with multi-qubit gates, or non-local measurements [28].
Such requirements are technically demanding,
which motivates entanglement-free schemes.

Reasonable lower bounds for the entanglement-free
scenario can be derived, but proving whether they are
achievable remains an open question. For each experiment
at some $k_j$ (with $k_j$ as in Section V), the amount
of information we obtain about $A$ can be quantified by
the Fisher information

$$I(A, k_j) = E\left[\left(\frac{\partial \log p(A, k_j)}{\partial A}\right)^2\right], \tag{B.1}$$

where expectation over success and failure is taken.
As the $2M_j$ repeats of the experiment are independent,
the total information obtained over all values of
$k$ is $I = \sum_{j=1}^{K} I(A, k_j)\,2M_j$. In the large $K$ limit,
$I = \frac{2}{9}4^{K}(3\beta + \alpha)$. Using the Cramér-Rao inequality [11]
then bounds the variance of $\hat A$ obtained via any unbiased
estimator, such as maximum likelihood estimation,
by $\sigma^2(\hat A) \ge I^{-1}$. Thus we obtain

$$\sigma(\hat A)T \ge \frac{3\sqrt{2}\,(\alpha + \beta)}{\sqrt{\alpha + 3\beta}}.$$

At the settings of $\alpha = 5/2$, $\beta = 1/2$, we obtain $\sigma(\hat A)T \ge
2.0\pi$, which is about five times smaller than that obtained
through Eq. (V.9).

While maximum likelihood is a reasonable approach
for standard phase estimation, once additive errors are
included, we no longer have an unbiased estimator, so
in this setting it is unfair to compare our bound to
the Cramér-Rao bound. Once additive errors are included,
we do not have an appropriate lower bound on
the scaling.


Appendix C: Initial Bounding Techniques


Our single-qubit calibration procedure works only 
when the errors are below a certain initial size. Here we 
show how the initial size of these errors can be bounded 
by conducting the appropriate experiments. 


1. Bounding $\epsilon$ and $\theta$


In Section III B, we showed that we can estimate $\epsilon$ and
$\theta$ at the Heisenberg limit as long as $\epsilon^2$ and $\theta^2$ are not
too large. Here, we give a procedure to bound the initial
size of $\epsilon$ and $\theta$.

Let

$$q_0 = \left|\langle 0|\, X_{\pi/4}(\epsilon, \theta)^4\, |0\rangle\right|^2. \tag{C.1}$$

By direct calculation, we have

$$q_0 = \sin^2(\theta) + \cos^2(\theta)\sin^2\left(\frac{\pi\epsilon}{2}\right). \tag{C.2}$$

The maximum value $\theta$ can attain is found by setting
$\epsilon = 0$. This gives us

$$|\theta| \le \arcsin\sqrt{q_0}. \tag{C.3}$$

Likewise, the maximum value $\epsilon$ can attain is found by
setting $\theta = 0$. This gives us

$$|\epsilon| \le \frac{2\arcsin\sqrt{q_0}}{\pi}. \tag{C.4}$$

Now we just need to bound $q_0$. Using Hoeffding's
bound, if we make $V$ observations of $q_0$, we can obtain
an estimate $\hat q_0$ for $q_0$ such that

$$P(q_0 < \hat q_0 + \mu) \ge 1 - \exp[-2V\mu^2]. \tag{C.5}$$

Thus we have

$$|\theta| \le \arcsin\sqrt{\hat q_0 + \mu}, \qquad |\epsilon| \le \frac{2\arcsin\sqrt{\hat q_0 + \mu}}{\pi} \tag{C.6}$$

with probability $1 - \exp[-2V\mu^2]$.
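The bounds above translate directly into a small calculation. The following sketch assumes the reading $|\theta| \le \arcsin\sqrt{\hat q_0 + \mu}$ and $|\epsilon| \le 2\arcsin\sqrt{\hat q_0 + \mu}/\pi$ with confidence $1 - \exp[-2V\mu^2]$; the function name and the example numbers are illustrative, not data from the paper.

```python
import math

def initial_error_bounds(q0_hat, V, mu):
    """Return (theta_max, eps_max, confidence) from an observed frequency.

    q0_hat: fraction of the V runs in which outcome 0 was observed
    mu:     Hoeffding slack added to the estimate
    """
    q_upper = min(1.0, q0_hat + mu)           # probabilities cannot exceed 1
    theta_max = math.asin(math.sqrt(q_upper))
    eps_max = 2 * theta_max / math.pi
    confidence = 1 - math.exp(-2 * V * mu ** 2)
    return theta_max, eps_max, confidence

# Hypothetical example: outcome 0 seen in 1% of 10,000 runs, slack 0.02.
theta_max, eps_max, confidence = initial_error_bounds(0.01, 10_000, 0.02)
```

Larger `mu` raises the confidence but loosens both bounds, so in practice one would pick `mu` from the target confidence and the available number of runs `V`.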

2. Bounding Measurement Error 

In this section, we show how to bound $\delta_{|0\rangle\langle 0|,W}$ of Eq.
(IV.13), given access to $W$, the faulty measurement operator,
and the ability to prepare the states $|0\rangle\langle 0|$ and $\varrho$,
where $\varrho$ is ideally close to $|1\rangle\langle 1|$.

Consider the following measurements:

$$G_0 = \mathrm{tr}(W|0\rangle\langle 0|), \qquad G_1 = \mathrm{tr}(W\varrho). \tag{C.7}$$

Suppose $V$ observations are made of each variable $G_0$
and $G_1$, producing estimates $\hat G_0$ and $\hat G_1$ of the respective
variables. Then using Hoeffding's bound, we have that

$$P(G_0 \ge \hat G_0 - \mu) \ge 1 - \exp[-2V\mu^2], \qquad P(G_1 \le \hat G_1 + \mu) \ge 1 - \exp[-2V\mu^2]. \tag{C.8}$$

We will show that if

$$G_0 \ge \hat G_0 - \mu \equiv G_0^- \quad \text{and} \quad G_1 \le \hat G_1 + \mu \equiv G_1^+, \tag{C.9}$$

then

$$\delta_W \le \Delta_1 + \sqrt{\Delta_1^2 + \Delta_2^2/2}, \tag{C.10}$$

where

$$\Delta_1 = \frac{(G_0^-)^2 - (G_1^+)^2 - 3G_0^- + 2G_0^- G_1^+ - G_1^+ + 2}{2(G_0^- - G_1^+)}, \qquad \Delta_2 = 2(1 - G_0^-). \tag{C.11}$$

By the union bound, we have

$$P\left(\delta_W \le \Delta_1 + \sqrt{\Delta_1^2 + \Delta_2^2/2}\right) \ge 1 - 2\exp[-2V\mu^2]. \tag{C.12}$$

One can verify that if $\hat G_0 \approx 1$ and $\hat G_1 \approx 0$, and $\mu \ll 1$,
then $\Delta_1$ and $\Delta_2$ are small and hence $\delta_W$ is small.

Since Pauli operators are an orthonormal basis for
Hermitian operators, we can write

$$W = \sum_{i=0}^{3} m_i V_i, \qquad \varrho = \frac{1}{2}\left[V_0 + \sum_{i=1}^{3} r_i V_i\right], \tag{C.13}$$

where $V_0 = I$, $V_1 = \sigma_x$, $V_2 = \sigma_y$, and $V_3 = \sigma_z$. Additionally
$W$ and $\varrho$ must be positive semidefinite, and
$0 \le \mathrm{tr}(W\rho) \le 1$ for all $\rho$.

Using Eq. (C.9) and Eq. (C.13), we have

$$1 \ge m_0 + m_3 \ge G_0^-, \tag{C.14}$$

$$0 \le \sum_{i=0}^{3} m_i r_i \le G_1^+ \quad (\text{with } r_0 = 1). \tag{C.15}$$

We will use Eq. (C.14) to upper bound the size of $m_1$
and $m_2$. The eigenvalues of $W$ must lie in the range $[0, 1]$.
Explicitly evaluating the eigenvalues of $W$, and requiring
that they are in this range gives

$$0 \le m_1^2 + m_2^2 \le (1 - m_0)^2 - m_3^2. \tag{C.16}$$

Using Eq. (C.14), we have

$$1 - m_0 \ge m_3 \ge G_0^- - m_0. \tag{C.17}$$

Thus we can write

$$m_3 = f - m_0 \tag{C.18}$$

for some $G_0^- \le f \le 1$. Plugging Eq. (C.18) into Eq.
(C.16) and taking the derivative with respect to $m_0$, we
find

$$0 \le m_1^2 + m_2^2 \le (1 - f)^2. \tag{C.19}$$

Since $G_0^- \le f \le 1$, we finally have

$$0 \le m_1^2 + m_2^2 \le (1 - G_0^-)^2, \tag{C.20}$$

so

$$|m_1|, |m_2| \le 1 - G_0^-. \tag{C.21}$$

Using Eq. (C.14), and that $1 - r_3 > 0$, we have

$$\begin{aligned}
m_3 &\ge \frac{G_0^- - m_0 - m_3 r_3}{1 - r_3} \\
&\ge \frac{G_0^- - G_1^+ - (1 - G_0^-)(|r_1| + |r_2|)}{1 - r_3},
\end{aligned} \tag{C.22}$$

where in the second line, we have used Eq. (C.21).
Assuming $G_0^- \approx 1$ and $G_1^+ \approx 0$, the numerator of Eq.
(C.22) will be positive. Using the positive semidefinite
constraint on $\varrho$, we have $r_3 \ge -\sqrt{1 - r_1^2 - r_2^2}$, so

$$m_3 \ge \frac{G_0^- - G_1^+ - (1 - G_0^-)(|r_1| + |r_2|)}{1 + \sqrt{1 - r_1^2 - r_2^2}}.$$

We always want to choose $r_1 = r_2$. If $r_1 \ne r_2$, we can
replace $r_1$ and $r_2$ by their average, thereby preserving
the numerator while increasing the denominator. Thus

$$m_3 \ge \frac{G_0^- - G_1^+ - 2(1 - G_0^-)|r_1|}{1 + \sqrt{1 - 2r_1^2}}. \tag{C.23}$$

We now minimize the right hand side of Eq. (C.23)
with respect to $r_1$ (assuming we are in a regime where
$G_0^- \approx 1$ and $G_1^+ \approx 0$), giving

$$m_3 \ge \frac{2 - (2 - G_0^-)^2 - G_1^+(2G_0^- - G_1^+)}{2(G_0^- - G_1^+)}. \tag{C.24}$$

At this point, we can bound the error that results from
using $W$ instead of the ideal $|0\rangle\langle 0|$. For an arbitrary state
$\omega$ such that

$$\omega = \frac{1}{2}\left[V_0 + \sum_{i=1}^{3} w_i V_i\right], \tag{C.25}$$

we have

$$\left|\mathrm{tr}(W\omega) - \mathrm{tr}(|0\rangle\langle 0|\omega)\right| \le \Delta_1(1 + w_3) + \Delta_2\sqrt{\frac{1 - w_3^2}{2}}, \tag{C.26}$$

with $\Delta_1$ and $\Delta_2$ given by Eq. (C.11), and we have used
the trick of replacing $w_1$ and $w_2$ by their average. Maximizing
(C.26) with respect to $w_3$ we have

$$\left|\mathrm{tr}(W\omega) - \mathrm{tr}(|0\rangle\langle 0|\omega)\right| \le \Delta_1 + \sqrt{\Delta_1^2 + \Delta_2^2/2}, \tag{C.27}$$

as claimed.
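The final bound can be evaluated from measured frequencies with a few lines of code. This is a sketch under stated assumptions: $\Delta_2 = 2(1 - G_0^-)$, and $\Delta_1$ is taken to be one half minus the lower bound on $m_3$ from Eq. (C.24), which is the reading of Eq. (C.11) adopted here. The function name and the example inputs are illustrative.

```python
import math

def measurement_error_bound(G0_hat, G1_hat, V, mu):
    """Return (bound on delta_W, confidence) from estimates of G0 and G1.

    Assumes G0_hat - mu > G1_hat + mu, i.e. the measurement distinguishes
    the two prepared states.
    """
    g0_minus = G0_hat - mu   # lower confidence limit on G0
    g1_plus = G1_hat + mu    # upper confidence limit on G1
    delta_1 = (g0_minus ** 2 - g1_plus ** 2 - 3 * g0_minus
               + 2 * g0_minus * g1_plus - g1_plus + 2) / (2 * (g0_minus - g1_plus))
    delta_2 = 2 * (1 - g0_minus)
    bound = delta_1 + math.sqrt(delta_1 ** 2 + delta_2 ** 2 / 2)
    confidence = 1 - 2 * math.exp(-2 * V * mu ** 2)  # union bound over G0, G1
    return bound, confidence

# Hypothetical example: 99% and 2% observed frequencies from 20,000 runs each.
bound, confidence = measurement_error_bound(0.99, 0.02, V=20_000, mu=0.01)
```

With a perfect measurement and perfect preparations (`G0_hat = 1`, `G1_hat = 0`, `mu = 0`) the bound evaluates to zero, as expected.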






